1 Summary

The  purpose  of  this  program  is  to  address  the  development  of  algorithms  for  adaptive 
processing  of  multi-sensor  data,  employing  feedback  to  optimize  the  linkage  between 
observed  data  and  sensor  control.  The  envisioned  multi-modal  adaptive  system  is 
applicable  for  intelligence,  surveillance,  and  reconnaissance  (ISR)  in  general 
environments,  addressing  base  and  port  security,  as  well  as  urban  and  suburban  sensing 
during  wartime  and  peace-keeping  operations.  Of  significant  importance  for  current  and 
anticipated  DoD  activities,  the  ISR  system  is  designed  to  detect  asymmetric  threats,  with 
the  goal  of  recognizing  unusual  behavior  or  activities.  Technologies  and  systems 
developed  under  this  effort  will  be  designed  for  semi-automated  scene  awareness,  with 
the objective of recognizing behavior that appears atypical (e.g., atypical object motion
and dynamic characteristics of people and vehicles). Leveraging our previously
developed technology, SIG is developing second-generation methods to adaptively learn
the  statistics  of  dynamic  object  behavior  in  video,  while  focusing  on  defining  system 
requirements  for  sensor  deployment  by  using  field  data  (vs.  highly  controlled  indoor 
data).  SIG  is  also  working  closely  with  its  subcontractor,  Lockheed  Martin,  to  integrate 
additional  technologies,  such  as  object  classification  and  recognition,  to  provide  a  more 
robust  and  discriminative  system.  Additionally,  SIG  is  cooperating  with  the  Navy’s 
China  Lake  facility  to  collect  representative  data  for  a  deployed  system,  and  to  specify 
requirements and features necessary for such a system. Finally, SIG is coordinating with
Integrian  on  prototype  development  schedules  and  product  integration  requirements,  and 
defining  a  joint  marketing  and  commercialization  strategy  for  such  products. 

2  Technical  Developments 

Over  the  course  of  the  past  two  months,  significant  progress  has  been  made  towards 
adding the final features necessary for the video tracking system. The code has been
successfully transitioned into its final format in C for efficient implementation, allowing us
to perform significant optimizations and achieve very efficient running times, on the
order  of  35  fps.  We  have  also  added  important  capabilities  for  object  tracking  across 
multiple  cameras,  object  classification  (allowing  behavior  analysis  to  be  conditioned  on 
the  type  of  object  observed),  and  virtual  pan/tilt/zoom  capabilities. 

2.1  C  code  and  Optimizations 

Finishing  the  work  started  last  reporting  period,  SIG  has  completed  the  transition  of  the 
code  from  the  research  rapid  development  environment  in  Matlab  into  the  more  efficient 
and flexible C code, which will be the format for the final delivered code. Development
in C offers significant benefits for the system, including efficiency,
flexibility, and portability. It also allows direct access to a variety of
camera video streams through the camera developer’s own highly efficient
interface libraries for data input and camera control.

One  of  the  primary  benefits  of  a  C  implementation  is  the  increased  memory  and 
processing  efficiency  possible  in  a  lower  level  compiled  system.  Significant  time  has 
been spent optimizing the tracking code, and we have achieved highly successful results.
Intentionally testing on an older, lower-powered system, we used a 1.7 GHz Pentium
M processor, which is more indicative of the anticipated system that would be readily
available for a complete anomalous behavior detection system. On our benchmark
testing  sequences,  including  up  to  four  objects  moving  simultaneously  on  screen,  we  are 
able  to  achieve  speeds  of  35  frames  per  second,  well  in  excess  of  our  targeted  15  fps. 

This  provides  plenty  of  leeway  for  inclusion  of  additional  processing  to  handle  the  more 
advanced  features  we  anticipate  including  over  the  next  couple  of  months. 
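
The headroom implied by these figures can be worked out as a per-frame time budget. The short sketch below uses only the two frame rates quoted above; the function and variable names are illustrative and do not come from the delivered code.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


achieved_fps = 35.0  # benchmark rate reported above
target_fps = 15.0    # target rate reported above

used_ms = frame_budget_ms(achieved_fps)    # time the tracker currently needs per frame
allowed_ms = frame_budget_ms(target_fps)   # time available per frame at the target rate
headroom_ms = allowed_ms - used_ms         # time left for additional processing

print(f"tracking uses {used_ms:.1f} ms of a {allowed_ms:.1f} ms budget; "
      f"{headroom_ms:.1f} ms remain per frame")
```

At these rates, a little more than half of each frame interval at the 15 fps target remains available for the additional features.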

2.2  Multi-camera  Tracking 

To  maximally  leverage  the  power  of  a  multi-camera  system,  it  is  critical  to  be  able  to 
track  objects  as  they  pass  from  the  field  of  view  of  one  camera  to  another.  While 
tracking  each  object  separately  for  each  camera  may  be  possible,  the  data  association 
problem  can  be  difficult,  as  many  viewing  conditions  can  differ  between  the  two 
cameras,  such  as  viewing  angle,  distance,  lighting  conditions,  and  occlusions. 

Our  proposed  solution  specifically  seeks  to  create  observation  models  which  take  these 
viewing  conditions  into  account  to  create  a  probabilistic  set  of  associations.  Specifically, 
we  consider  the  joint  probability  of  association,  which  allows  all  of  the  observable  data  to 
be  treated  in  a  relative  fashion.  Given  that  an  observed  pair  of  people  in  one  camera 
could  be  associated  with  two  other  people  in  a  second  camera,  it  can  be  challenging  to 
accurately  model  the  probability  of  association  for  each  one  independently.  However, 
when  considering  the  joint  probability,  the  factors  which  are  most  relevant  become  the 
relative differences. For instance, different viewing angles can make modeling the
expected position of an object in the second camera very difficult, but the joint probability is
able to normalize out much of this noise and instead focus on preserving the relative
positions of the different objects.
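
As a rough illustration of why the joint formulation helps, the sketch below scores every one-to-one assignment between two cameras at once. The report does not specify the observation model, so everything here is an assumption: positions are taken to be 2-D points in roughly comparable coordinates, the two cameras are assumed to see the same number of objects, and the score is a Gaussian on residuals after removing their mean, so that a shared offset between the two views cancels and only the relative layout is penalized.

```python
from itertools import permutations
import math


def joint_log_score(cam_a, cam_b, assignment, sigma=1.0):
    """Log-score (up to a constant) of one complete assignment.

    assignment[i] = j pairs object i in camera A with object j in camera B.
    Residuals are centred by their mean, so an offset common to all
    objects drops out of the score.
    """
    residuals = [(cam_b[j][0] - cam_a[i][0], cam_b[j][1] - cam_a[i][1])
                 for i, j in enumerate(assignment)]
    mean_x = sum(r[0] for r in residuals) / len(residuals)
    mean_y = sum(r[1] for r in residuals) / len(residuals)
    sq = sum((r[0] - mean_x) ** 2 + (r[1] - mean_y) ** 2 for r in residuals)
    return -sq / (2.0 * sigma ** 2)


def joint_association_posterior(cam_a, cam_b, sigma=1.0):
    """Normalised probabilities over all one-to-one assignments."""
    scores = {p: joint_log_score(cam_a, cam_b, p, sigma)
              for p in permutations(range(len(cam_b)))}
    peak = max(scores.values())
    weights = {p: math.exp(s - peak) for p, s in scores.items()}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


# Hypothetical data: camera B sees the same two people shifted by about
# (3, 1), and reports them in the opposite order.
cam_a = [(0.0, 0.0), (2.0, 0.0)]
cam_b = [(5.2, 1.0), (3.1, 1.1)]
posterior = joint_association_posterior(cam_a, cam_b)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this made-up example, matching each person independently to the nearest position would pair both camera-A objects with the same camera-B object, whereas the joint score favors the assignment that preserves their relative positions. Enumerating permutations is only practical for a handful of objects.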

An initial implementation of this approach has been completed and tested
on our data. The results have been very encouraging, and as this technology matures, it
will  be  integrated  with  the  behavior  modeling  and  analysis  component  of  the  system. 

This will give the system significantly more information to work
with and allow it to analyze potentially suspicious behavior over a significantly longer
period of time, providing a corresponding increase in the discriminative power of the
system.

2.3  Object  Classification 

In the recent reporting period, we have implemented algorithms to perform classification
of  tracked  foreground  objects  into  one  of  multiple  predefined  object  classes.  This 
classification  mechanism  is  based  on  shape  characteristics  as  captured  by  the 
probabilistic  shape  models  maintained  internal  to  the  Bayesian  tracking  engine.  To 
reduce  the  complexity  of  the  classification,  we  distill  the  shape  model  into  a  number  of 
easily computed characteristics, such as height and width. The classifications determined
from each of these individual characteristics can be combined in a principled manner to
achieve accurate class distributions. This concept has been applied to classification in many
different ways before, such as AdaBoost or bootstrapping methods. We have taken these
ideas and updated them for inclusion in our Bayesian framework. This provides a
principled  way  of  conditioning  the  expected  behavior  of  the  objects  on  their  appearance. 

$$p(\text{pose} \mid \text{models}, \text{shape}) = \sum_{\text{class}} p(\text{pose} \mid \text{class}, \text{models})\, p(\text{class} \mid \text{shape})$$
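
The marginalization over object class can be sketched in a few lines. This is a minimal illustration, not the tracker's actual classifier: the class names, the Gaussian feature models, their parameters, and the uniform prior are all hypothetical, and the per-feature likelihoods are combined by treating height and width as independent, which is one simple choice among several.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical per-class feature models: (mean, std) in pixels.
FEATURE_MODELS = {
    "pedestrian": {"height": (60.0, 10.0), "width": (20.0, 5.0)},
    "vehicle": {"height": (40.0, 10.0), "width": (90.0, 20.0)},
}
CLASS_PRIOR = {"pedestrian": 0.5, "vehicle": 0.5}  # assumed uniform prior


def class_posterior(shape):
    """p(class | shape), treating the shape features as independent."""
    unnormalised = {}
    for cls, features in FEATURE_MODELS.items():
        likelihood = CLASS_PRIOR[cls]
        for name, (mean, std) in features.items():
            likelihood *= gaussian_pdf(shape[name], mean, std)
        unnormalised[cls] = likelihood
    total = sum(unnormalised.values())
    return {cls: value / total for cls, value in unnormalised.items()}


def pose_likelihood(pose_given_class, shape):
    """Sum over classes of p(pose | class, models) * p(class | shape)."""
    posterior = class_posterior(shape)
    return sum(pose_given_class[cls] * posterior[cls] for cls in posterior)


# A tall, narrow object whose observed pose is unlikely for a pedestrian
# but likely for a vehicle (made-up likelihood values).
shape = {"height": 62.0, "width": 22.0}
posterior = class_posterior(shape)
mixed = pose_likelihood({"pedestrian": 0.02, "vehicle": 0.30}, shape)
print(posterior, mixed)
```

Because the shape evidence points strongly to a pedestrian, the mixed likelihood stays close to the pedestrian value, so pose that would be normal for a vehicle is still scored as unusual for this object.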

The initial classifier performance is encouraging, and further refinements are being made
based on additional data sets. The classifier is able to make decisions not just about
individual shape classes (e.g., pedestrian versus vehicle), but also about other
features, such as the presence of multiple pedestrians in a group, that might otherwise be
difficult to discern from simple features such as velocity or position alone.

2.4  Virtual  Pan/Tilt/Zoom 

One of the objectives of the current project work is to use the information extracted from
the video detection and tracking engine not only for behavior analysis, but also for
active sensor management that can enhance the information gained from objects
identified by the tracking system. Camera technology trends and our own analysis results
indicate that a promising way to implement a Sensor Management Agent (SMA)
is to have it create ‘virtual’ zoom and pan functions, used in
conjunction with high-resolution digital video cameras, to achieve the same effect as
mechanical pan/tilt/zoom with much better control and flexibility.

For  typical  video  scenes,  video  analytic  processing  and  accurate  object  tracking  can  be 
performed  at  lower  resolution  without  loss  in  robustness  but  with  a  savings  in 
computational  load  as  well  as  internal  data  communications  bandwidth.  When  objects  of 
interest  are  detected  in  the  video  scene,  it  is  often  useful  to  create  a  ‘virtual  zoom’  camera 
by  processing  the  area  of  interest  at  higher  resolution  beyond  the  level  that  is  needed  to 
simply  maintain  tracking  of  the  object.  As  the  object  moves,  the  system  can  track  its 
motion  and  ensure  that  the  high  resolution  ‘virtual  zoom’  area  is  panned  with  the  moving 
object  of  interest  to  support  extraction  of  object  features  and/or  display  at  higher 
resolution. 
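
A minimal sketch of the window computation follows, assuming the tracker runs on a downsampled frame and reports object centers in that frame. The resolutions, the scale factor, and the function names are hypothetical, and the window is assumed to be no larger than the full-resolution frame.

```python
def to_full_resolution(center_lowres, scale):
    """Map a track position from the low-resolution analysis frame to the full frame."""
    return center_lowres[0] * scale, center_lowres[1] * scale


def virtual_zoom_window(center, frame_size, window_size):
    """Return (left, top, right, bottom) of a crop centred on a tracked object.

    Near the frame edge the window is shifted rather than shrunk, so the
    extracted region keeps a constant size as the object moves.
    """
    cx, cy = center
    frame_w, frame_h = frame_size
    win_w, win_h = window_size
    left = int(round(cx - win_w / 2))
    top = int(round(cy - win_h / 2))
    left = max(0, min(left, frame_w - win_w))
    top = max(0, min(top, frame_h - win_h))
    return left, top, left + win_w, top + win_h


# Tracking on a 320x240 frame, cropping from a 1280x960 frame (scale 4).
center_full = to_full_resolution((300, 120), scale=4)
window = virtual_zoom_window(center_full, (1280, 960), (256, 256))
print(window)
```

Calling this once per frame with the tracker's current estimate pans the window with the object; only the cropped region then needs to be processed or transmitted at full resolution.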

We have created the basic framework necessary for this approach to the SMA in our current
implementation. The method is based on identifying regions of interest, as specified by
the tracking algorithm. Since moving objects are the pertinent portions of the image,
these are intuitively the most informative sections to concentrate, or “zoom” in,
on. The tracking algorithm provides both a predictive and a posterior estimate of the
objects,  allowing  for  a  consistent  “pan/tilt”  mechanism,  to  provide  continuity  for  a  single 
virtual  zoom.  These  areas  can  then  be  extracted  and  sent  along  either  for  more 
processing,  or  along  a  narrow  bandwidth  connection  for  visualization  purposes.  This 
framework  provides  visualizations  that  demonstrate  a  virtual  pan/zoom  function  based  on 
tracked  objects  as  well  as  necessary  features  to  support  acoustic  fusion  as  discussed 
below. 
3 Acoustic Array

In addition to efficient extraction of visual information, this same virtual pan/zoom
framework will be useful as we extend the concept to additional types of sensors or
processing regimes controlled by the low-level object detection and tracking
engine, extracting even more information about the objects of interest
examined during asymmetric threat detection.

In  addition  to  our  video  work,  SIG  has  moved  forward  with  the  internal  development 
project  (not  developed  on  SEALS  funding)  to  build  a  linear  acoustic  array  that  can  collect 
acoustic  data  synchronized  with  color  video  data  to  create  a  real-time  acoustic  beam  that 
can  listen  to  a  stationary  or  moving  object  under  the  control  of  the  video  tracking  system. 
SIG  has  completed  initial  analysis  of  this  acoustic  array  sensor  and  we  are  proceeding 
with  data  collection  efforts  as  we  work  to  fuse  this  sensor  data  with  the  ongoing  video 
tracking  effort.  Figure  2  shows  results  of  modeling  this  sensor  to  determine  potential 
acoustic  gain  available  to  extract  acoustic  information  in  a  multi-sensor  tracking 
application. 


Figure 2: Initial results for antenna array gain pattern modeling showing that significant azimuth gain can
be achieved with a linear array, as well as the potential ability to null nearby acoustic sources.
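
To give a sense of the kind of gain pattern involved, the sketch below evaluates the textbook delay-and-sum response of a uniform linear array. It is not the model behind Figure 2; the element count, spacing, frequency, and steering angle are hypothetical values chosen only for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def array_response(theta_deg, steer_deg, n_elements, spacing_m, freq_hz):
    """Normalised delay-and-sum response of a uniform linear array.

    Angles are measured from broadside. The result is a magnitude in
    [0, 1], equal to 1 in the steering direction.
    """
    wavenumber = 2.0 * math.pi * freq_hz / SPEED_OF_SOUND
    phase_step = wavenumber * spacing_m * (
        math.sin(math.radians(theta_deg)) - math.sin(math.radians(steer_deg)))
    total = sum(cmath.exp(1j * n * phase_step) for n in range(n_elements))
    return abs(total) / n_elements


# Hypothetical array: 8 elements, 10 cm apart, 1 kHz source, steered to 20 degrees.
on_target = array_response(20.0, 20.0, 8, 0.10, 1000.0)
off_target = array_response(-30.0, 20.0, 8, 0.10, 1000.0)
print(f"on target: {on_target:.3f}, off target: {off_target:.3f}")
```

Updating the steering angle from the video track's bearing would keep the beam on a moving object, while sources well away from that bearing are attenuated.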

4  Future  Directions 

During  the  next  reporting  period,  our  goal  is  to  finish  integrating  more  advanced  methods 
into  the  tracking  system  to  allow  us  to  achieve  the  highest  levels  of  discrimination  in  the 
most challenging situations in our data collections. We will focus specifically on
apparent  object  separation  and  merging  instances,  for  example,  when  a  person  exits  or 
enters  a  vehicle,  or  when  they  drop  off  or  pick  up  a  package.  Another  area  of 
improvement  will  consider  the  challenges  that  can  arise  when  objects  remain  on  screen 
for  an  extended  period  of  time,  or  loiter  in  place  for  a  long  time.  This  situation  has 
particularly  strong  implications  for  the  color  models  used,  which  are  currently  being 
examined. The more recent features of the tracking algorithm, such as object
classification  and  trans-camera  tracking,  will  be  tightly  integrated  into  the  behavior 
analysis  section  of  the  system,  to  provide  more  discriminative  and  informative  results. 

In  addition  to  this  near-term  work,  we  are  continuing  to  plan  our  efforts  for  the  Year  III 
research  that  ONR  has  indicated  will  be  funded.  One  key  aspect  of  this  next  year,  as 
indicated  previously,  is  to  work  toward  transitioning  the  research  to  other  efforts,  such  as 
the  DARPA  LACOSTE.  In  this  effort  and  others,  an  important  component  of  the  sensor 
framework  involves  compressive-sensing  and  other  related  concepts  to  address  the  very 
high  envisioned  data  rates.  This  compressive-sensing  construct  involves  non-adaptive 
random sampling, while related constructs such as value-of-information sensor
management  algorithms  are  adaptive  and  non-random.  Given  ONR’s  interest  in 
compressive  sensing,  within  the  final  year  of  this  program  these  two  approaches  will  be 
investigated  in  the  context  of  high-bit-rate  video  collections. 
                              0.00      3,017.00
Subcontract Expense     111,265.00    174,189.00
Other Direct Costs            0.00        316.41
Expenses Subtotal       111,265.00    177,522.41

Overhead                  5,247.43     60,545.38

Subtotal                126,076.02    351,143.91

Fee 8%                    8,148.17     26,153.64

Total Amount Due SIG    134,224.19    377,297.54